Hybrid HMM/BLSTM-RNN for Robust Speech Recognition

Authors

  • Yang Sun
  • Louis ten Bosch
  • Lou Boves
Abstract

The question of how to integrate information from different sources in speech decoding (a layered architecture versus an integrated search) is still only partially solved. We investigate the optimal integration of information from Artificial Neural Networks into a speech decoding scheme based on a Dynamic Bayesian Network (DBN) for noise-robust ASR. An HMM implemented by the DBN cooperates with a novel Recurrent Neural Network (BLSTM-RNN), which exploits long-range context information to predict a phoneme for each MFCC frame. When the identity of the most likely phoneme is used as a direct observation, such a hybrid system has been shown to improve noise robustness. In this paper, we present the complete BLSTM-RNN output to the DBN as Virtual Evidence. This allows the hybrid system to use information about all phoneme candidates, which was not possible in previous experiments. Our approach improved word accuracy on the Aurora 2 corpus by 8%.
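The abstract contrasts two ways of handing the BLSTM-RNN output to the decoder: observing only the most likely phoneme per frame, or passing the complete output vector as Virtual Evidence. The sketch below illustrates that difference under loudly stated assumptions: a plain forward algorithm over a toy two-state left-to-right HMM stands in for the paper's DBN; the per-frame network outputs, the words "A" and "B", the transition values, and the fixed smoothing constant `eps` in the argmax case are all made up for illustration; and the function names (`forward_log_likelihood`, `hard_evidence`, `virtual_evidence`) are hypothetical. It is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def forward_log_likelihood(init: Sequence[float],
                           trans: Sequence[Sequence[float]],
                           evidence: Sequence[Sequence[float]]) -> float:
    """Scaled forward algorithm over one word model.

    evidence[t][s] is the likelihood attached to hidden state s at frame t.
    Returns log P(evidence), summed over all end states, or -inf if every
    state path is ruled out.
    """
    n = len(init)
    alpha = [init[s] * evidence[0][s] for s in range(n)]
    log_total = 0.0
    for frame in evidence[1:]:
        z = sum(alpha)
        if z == 0.0:
            return float("-inf")
        log_total += math.log(z)
        alpha = [a / z for a in alpha]  # rescale each frame to avoid underflow
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * frame[s]
                 for s in range(n)]
    z = sum(alpha)
    return log_total + math.log(z) if z > 0.0 else float("-inf")


def hard_evidence(posteriors: Sequence[Sequence[float]],
                  state_phone: Sequence[int],
                  eps: float = 0.1) -> List[List[float]]:
    """Only the argmax phoneme of each frame is observed.

    P(observed phoneme | state phoneme) uses a fixed smoothing eps
    (a hypothetical constant chosen for illustration, not from the paper).
    """
    k = len(posteriors[0])
    rows = []
    for frame in posteriors:
        observed = max(range(k), key=lambda q: frame[q])
        rows.append([1.0 - eps if p == observed else eps / (k - 1)
                     for p in state_phone])
    return rows


def virtual_evidence(posteriors: Sequence[Sequence[float]],
                     state_phone: Sequence[int]) -> List[List[float]]:
    """The full output vector is used: each state gets its phoneme's score."""
    return [[frame[p] for p in state_phone] for frame in posteriors]


# Made-up network outputs: 3 phonemes, 4 frames; the last frame is "noisy"
# (phoneme 2 narrowly beats phoneme 1).
posteriors = [
    [0.8, 0.1, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.6, 0.3],
    [0.1, 0.4, 0.5],
]
init = [1.0, 0.0]                   # start in the first state
trans = [[0.5, 0.5], [0.0, 1.0]]    # two-state left-to-right topology
words = {"A": [0, 1], "B": [0, 2]}  # hypothetical words as phoneme strings

scores: Dict[str, Dict[str, float]] = {}
for name, phones in words.items():
    scores[name] = {
        "hard": forward_log_likelihood(
            init, trans, hard_evidence(posteriors, phones)),
        "virtual": forward_log_likelihood(
            init, trans, virtual_evidence(posteriors, phones)),
    }
    print(name, {kind: round(v, 3) for kind, v in scores[name].items()})
```

With these toy numbers both variants rank word A above word B; what differs is the information the decoder receives. In the argmax case the noisy last frame contributes only the identity of phoneme 2, whereas with virtual evidence phoneme 1 still contributes its score of 0.4. Dividing each score by a phoneme prior (the scaled-likelihood step common in hybrid HMM/ANN systems) would be a one-line change in `virtual_evidence`; the abstract does not say whether the paper does this.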

Similar sources

TTS synthesis with bidirectional LSTM based recurrent neural networks

Feed-forward deep neural network (DNN)-based text-to-speech (TTS) systems have recently been shown to outperform decision-tree-clustered, context-dependent HMM TTS systems [1, 4]. However, long-span contextual effects in a speech utterance are still not easy to accommodate, due to the intrinsically feed-forward nature of DNN-based modeling. Also, to synthesize a smooth speech trajectory, th...

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection

In this study, we propose a new method of polyphonic sound event detection based on a Bidirectional Long Short-Term Memory Hidden Markov Model hybrid system (BLSTM-HMM). We extend the hybrid model of neural network and HMM, which achieved state-of-the-art performance in the field of speech recognition, to the multi-label classification problem. This extension provides an explicit duration model ...

Robust in-car spelling recognition - a tandem BLSTM-HMM approach

As an intuitive hands-free input modality, automatic spelling recognition is especially useful for in-car human-machine interfaces. However, for today's speech recognition engines it is extremely challenging to cope with similar-sounding spelling speech sequences in the presence of noise such as the driving noise inside a car. Thus, we propose a novel Tandem spelling recogniser, combining a Hid...

Long short-term memory networks for noise robust speech recognition

In this paper we introduce a novel hybrid model architecture for speech recognition and investigate its noise robustness on the Aurora 2 database. Our model is composed of a bidirectional Long Short-Term Memory (BLSTM) recurrent neural net exploiting long-range context information for phoneme prediction and a Dynamic Bayesian Network (DBN) for decoding. The DBN is able to learn pronunciation va...

A Comparitive Survey of ANN and Hybrid HMM/ANN Architectures for Robust Speech Recognition

This paper proposes two hybrid connectionist structural acoustical models for robust, context-independent phone-like and word-like units for a speaker-independent recognition system. Such a structure combines the strength of Hidden Markov Models (HMM) in modeling stochastic sequences with the non-linear classification capability of Artificial Neural Networks (ANN). Two kinds of Neural Networks (NN) are i...

Publication date: 2010